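A Critical Hit from Undergraduates: Tsinghua Open-Sources the Tianshou (天授) Reinforcement Learning Platform, Implemented in Pure PyTorch
This article is reposted with permission from Synced (机器之心; WeChat official account: almosthuman2014). Further reposting is prohibited.
Project author: thu-ml
Contributors: 思, 肖清
Blazing-fast model training and a lean 1,500 lines of source code: this is Tianshou (天授), the reinforcement learning platform newly open-sourced by Tsinghua University. Notably, the project's two main authors are both currently undergraduates at Tsinghua University.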
Policy Gradient (PG)
Deep Q-Network (DQN)
Double DQN (DDQN) with n-step returns
Advantage Actor-Critic (A2C)
Deep Deterministic Policy Gradient (DDPG)
Proximal Policy Optimization (PPO)
Twin Delayed DDPG (TD3)
Soft Actor-Critic (SAC)
result = collector.collect(n_step=n)
result = policy.learn(collector.sample(batch_size))
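The two calls above follow a collect-then-learn pattern: gather transitions into a buffer, then update the policy from a sampled batch. Below is a minimal, library-free sketch of that pattern on a toy two-armed bandit. `ToyEnv`, `ToyPolicy`, and `ToyCollector` are hypothetical stand-ins written for illustration; they are not Tianshou's classes and only borrow the method names `collect`, `sample`, and `learn`.

```python
import random


class ToyEnv:
    """Hypothetical one-step environment: action 1 pays 1.0, action 0 pays 0.0."""

    def step(self, action):
        return float(action == 1)


class ToyPolicy:
    """Hypothetical epsilon-greedy policy with a mean-reward estimate per action."""

    def __init__(self, n_actions=2, epsilon=0.2, seed=0):
        self.values = [0.0] * n_actions
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def __call__(self):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, batch):
        # Incremental mean update for every (action, reward) pair in the batch.
        for action, reward in batch:
            self.counts[action] += 1
            self.values[action] += (reward - self.values[action]) / self.counts[action]
        return {"n_updates": len(batch)}


class ToyCollector:
    """Hypothetical collector: runs the policy in the env and stores transitions."""

    def __init__(self, policy, env, seed=0):
        self.policy, self.env = policy, env
        self.rng = random.Random(seed)
        self.buffer = []

    def collect(self, n_step):
        for _ in range(n_step):
            action = self.policy()
            self.buffer.append((action, self.env.step(action)))
        return {"n_step": n_step}

    def sample(self, batch_size):
        # Draw a random batch (without replacement) from the stored transitions.
        return self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))


policy = ToyPolicy()
collector = ToyCollector(policy, ToyEnv())
for _ in range(50):
    collector.collect(n_step=10)                   # gather fresh transitions
    policy.learn(collector.sample(batch_size=8))   # update from a sampled batch
```

Keeping data collection and learning behind separate objects means the same loop can serve on-policy and off-policy methods; only the buffer handling and the body of `learn` change.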
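__init__: initializes the policy
process_fn: processes data taken from the replay buffer
__call__: computes the action for a given environment observation
learn: updates the policy from a given batch of data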
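To make the division of labor among the four methods listed above concrete, here is a small sketch of a policy-gradient-style policy for a stateless two-action task. `SketchPolicy` is a hypothetical class that only mirrors the four method names; its signatures, the `rng` argument, and the `(action, reward)` data layout are assumptions made for this example, not Tianshou's actual interface.

```python
import math
import random


class SketchPolicy:
    """Hypothetical softmax policy; not Tianshou's base class."""

    def __init__(self, n_actions=2, lr=0.1, gamma=0.9):
        # __init__: set up parameters and hyperparameters.
        self.logits = [0.0] * n_actions
        self.lr = lr
        self.gamma = gamma

    def _probs(self):
        # Numerically stable softmax over the logits.
        m = max(self.logits)
        exps = [math.exp(x - m) for x in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def process_fn(self, episode):
        # process_fn: turn raw (action, reward) steps into
        # (action, discounted return) training pairs.
        processed, ret = [], 0.0
        for action, reward in reversed(episode):
            ret = reward + self.gamma * ret
            processed.append((action, ret))
        processed.reverse()
        return processed

    def __call__(self, obs, rng):
        # __call__: sample an action from the current softmax distribution.
        # `obs` is ignored because this toy task is stateless.
        return rng.choices(range(len(self.logits)), weights=self._probs())[0]

    def learn(self, batch):
        # learn: one REINFORCE-style gradient ascent step per sample.
        for action, ret in batch:
            probs = self._probs()
            for a in range(len(self.logits)):
                grad = (1.0 if a == action else 0.0) - probs[a]
                self.logits[a] += self.lr * ret * grad
        return {"logits": list(self.logits)}


rng = random.Random(0)
policy = SketchPolicy()
for _ in range(200):
    # Roll out a short episode in which only action 1 is rewarded.
    episode = []
    for _ in range(5):
        action = policy(None, rng)
        episode.append((action, float(action == 1)))
    policy.learn(policy.process_fn(episode))
```

In this sketch the preprocessing step (computing returns) lives in `process_fn` rather than in `learn`, so the update rule stays independent of how the training targets are derived from buffer data.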
!git clone https://github.com/thu-ml/tianshou
!pip3 install tianshou
import os
os.chdir('tianshou')
!python test/discrete/test_pg.py
!python test/discrete/test_ppo.py
!python test/discrete/test_a2c.py
!python test/discrete/test_dqn.py
Prioritized replay buffer
RNN support
Imitation Learning
Multi-agent
Distributed training